Bacteroidia and Clostridia are equipped to degrade a cascade of polysaccharides along the hindgut of the herbivorous fish Kyphosus sydneyanus

Abstract
The gut microbiota of the marine herbivorous fish Kyphosus sydneyanus are thought to play an important role in host nutrition by supplying short-chain fatty acids (SCFAs) through fermentation of dietary red and brown macroalgae. Here, using 645 metagenome-assembled genomes (MAGs) from wild fish, we determined the capacity of different bacterial taxa to degrade seaweed carbohydrates along the gut. Most bacteria (99%) were unclassified at the species level. Gut communities and CAZyme-related transcriptional activity were dominated by Bacteroidia and Clostridia. Both classes possess genes encoding CAZymes that act on internal polysaccharide bonds, suggesting that they initiate glycan depolymerization, followed by the rarer Gammaproteobacteria and Verrucomicrobiae. Results indicate that Bacteroidia utilize substrates from both brown and red algae, whereas other taxa, namely Clostridia, Bacilli and Verrucomicrobiae, mainly utilize brown algae. Bacteroidia had the highest CAZyme gene densities overall, and Alistipes were especially enriched in CAZyme gene clusters (n = 73, versus just 62 distributed across all other taxa), pointing to an enhanced capacity for macroalgal polysaccharide utilization (e.g., alginate, laminarin and sulfated polysaccharides). Pairwise correlations of MAG relative abundances and encoded CAZyme compositions provide evidence of potential inter-species collaboration. Co-abundant MAGs exhibited complementary degradative capacities for specific substrates, as well as flexibility in their carbon sources (e.g., glucose- or galactose-rich glycans), possibly facilitating coexistence via niche partitioning. Overall, the results indicate the potential for collaborative microbial carbohydrate metabolism in the K. sydneyanus gut, show that a greater variety of taxa contribute to the breakdown of brown than of red dietary algae, and suggest that Bacteroidia include specialized macroalgal degraders.

refined MAG completeness and contamination [14]. Mapped read counts were obtained with the BBTools 'pileup.sh' script [15] from Bowtie SAM files [7]. Counts were normalized by library size and genome length to estimate the genome coverage (and hence relative abundance) of MAGs across samples.
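The normalization step can be illustrated with a minimal Python sketch. It assumes, without support from the methods, that the counts form a MAG × sample table, that library size is the total number of reads per sample, and that relative abundance is the normalized value rescaled to 100% within each sample. The function and variable names are hypothetical.

```python
# Sketch: length- and depth-normalized MAG abundance from mapped read counts.
# Assumptions (not from the methods): counts are a MAG x sample table,
# library size is the total read count per sample, and relative abundance
# is the normalized value rescaled to sum to 100% within each sample.


def normalize_counts(counts, genome_lengths, library_sizes):
    """Return (coverage_proxy, relative_abundance_percent) per MAG and sample.

    counts: dict[mag][sample] -> mapped read count
    genome_lengths: dict[mag] -> genome length in bp
    library_sizes: dict[sample] -> total reads in that sample's library
    """
    samples = list(library_sizes)
    # Reads per bp of genome, per read sequenced: comparable across samples.
    cov = {
        mag: {s: counts[mag][s] / genome_lengths[mag] / library_sizes[s] for s in samples}
        for mag in counts
    }
    rel = {mag: {} for mag in counts}
    for s in samples:
        total = sum(cov[mag][s] for mag in counts)
        for mag in counts:
            rel[mag][s] = 100 * cov[mag][s] / total if total > 0 else 0.0
    return cov, rel


counts = {"MAG_1": {"IV": 1000, "V": 500}, "MAG_2": {"IV": 1000, "V": 1500}}
lengths = {"MAG_1": 2_000_000, "MAG_2": 4_000_000}
libs = {"IV": 10_000_000, "V": 20_000_000}
cov, rel = normalize_counts(counts, lengths, libs)
# rel["MAG_1"]["IV"] is about 66.7 and rel["MAG_1"]["V"] is 40.0
```

Within a single sample the library-size term cancels out of the relative abundances. It only matters when coverage values are compared across samples.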
Subsequent 2020 fish collection and omics data generation
Six K. sydneyanus individuals were collected from waters around Great Barrier Island, Auckland, New Zealand, in January 2020 (approval 001949 from the University of Auckland Animal Ethics Committee). Gut contents from sections III, IV and V were homogenized, aliquoted into microtubes and immediately frozen in liquid nitrogen before transfer to a −80 °C freezer. Prior to extraction, samples were thawed and centrifuged at 15,000 × g for 2 min, and the supernatant was removed. The remaining pellet was resuspended in solution CD1 from the PowerSoil Pro kit (QIAGEN, Germantown, MD, United States), and DNA was extracted following the manufacturer's instructions. Extracted DNA was purified using the Genomic DNA Clean & Concentrator-10 kit (Zymo Research, Irvine, California, United States).
DNA libraries were prepared at the Otago Genomics Facility (University of Otago, Dunedin, New Zealand) with the ThruPLEX DNA-Seq 96D kit (Takara Bio, Kusatsu, Shiga, Japan) for 18 high-molecular-weight DNA samples from sections III, IV and V of the six fish. Paired-end 125 bp reads were generated on the HiSeq V4 platform (Illumina, San Diego, CA, United States).
MAGs newly generated from the six K. sydneyanus individuals were combined with the 197 MAGs generated in this study (Table S2). All MAGs were checked for completeness and contamination using CheckM version 1.2.1 [14]. Only genomes >75% complete and with <5% contamination were kept. These genomes were then dereplicated with dRep version 2.3.2 [12], using the default threshold of 99% average nucleotide identity (ANI), to create a representative set of genomes. The representative genomes were classified using the GTDB Toolkit (GTDB-Tk, version 2.1.0) against the release 214 database [17]. The final dataset contained 397 unique, good-quality K. sydneyanus-associated genomes.
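The filtering and dereplication logic can be sketched as below. This is a simplified greedy stand-in, not dRep's actual algorithm. dRep clusters genomes in two stages and ranks them with a richer quality score. The `Mag` class, the score (completeness minus five times contamination) and the precomputed pairwise ANI table are all assumptions made for illustration.

```python
# Sketch: quality-filter MAGs, then greedily dereplicate at 99% ANI.
# Simplified stand-in for dRep; names and the score are hypothetical.
from dataclasses import dataclass


@dataclass
class Mag:
    name: str
    completeness: float  # %, e.g. from CheckM
    contamination: float  # %


def passes_quality(m: Mag) -> bool:
    # Thresholds stated in the methods: >75% complete, <5% contamination.
    return m.completeness > 75 and m.contamination < 5


def score(m: Mag) -> float:
    # Hypothetical quality score used only to order genomes.
    return m.completeness - 5 * m.contamination


def dereplicate(mags, ani, threshold=0.99):
    """Keep one representative per group of genomes sharing >= threshold ANI.

    ani: dict[frozenset({a, b})] -> pairwise ANI as a fraction (0-1).
    """
    kept = []
    # Visit the best-scoring genomes first so they become representatives.
    for m in sorted(filter(passes_quality, mags), key=score, reverse=True):
        redundant = any(
            ani.get(frozenset({m.name, r.name}), 0.0) >= threshold for r in kept
        )
        if not redundant:
            kept.append(m)
    return kept


mags = [Mag("a", 95, 1), Mag("b", 90, 0.5), Mag("c", 80, 2), Mag("d", 70, 1)]
ani = {
    frozenset({"a", "b"}): 0.995,
    frozenset({"a", "c"}): 0.95,
    frozenset({"b", "c"}): 0.96,
}
reps = dereplicate(mags, ani)  # "d" fails quality, "b" is redundant with "a"
```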

Substrate estimation based on EC similarity
Genes annotated as CAZymes with similar EC functions were filtered based on their association with degradation pathways for brown and red algal polysaccharides. Each EC function was assigned a substrate according to its CAZy annotation and the literature.
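As a hedged illustration, the EC-based substrate assignment amounts to a lookup from EC number to substrate. The table below is a small, hypothetical subset of well-characterized activities chosen for this example. It is not the authors' curated mapping.

```python
# Sketch: assign substrates to CAZyme genes from their predicted EC numbers.
# The table is an illustrative subset, not the study's curated list.

# EC number -> (substrate, algal source)
EC_SUBSTRATE = {
    "4.2.2.3": ("alginate", "brown"),  # mannuronate-specific alginate lyase
    "4.2.2.11": ("alginate", "brown"),  # poly(alpha-L-guluronate) lyase
    "3.2.1.39": ("laminarin", "brown"),  # glucan endo-1,3-beta-D-glucosidase
    "3.2.1.51": ("FCSP", "brown"),  # alpha-L-fucosidase
    "3.2.1.81": ("agarose", "red"),  # beta-agarase
    "3.2.1.83": ("carrageenan", "red"),  # kappa-carrageenase
    "3.2.1.157": ("carrageenan", "red"),  # iota-carrageenase
    "3.2.1.1": ("starch", "red"),  # alpha-amylase (e.g. floridean starch)
}


def assign_substrates(genes):
    """Keep genes whose EC number maps to a macroalgal substrate.

    genes: iterable of (gene_id, ec_number) pairs.
    Returns dict gene_id -> (substrate, algal source).
    """
    return {g: EC_SUBSTRATE[ec] for g, ec in genes if ec in EC_SUBSTRATE}


hits = assign_substrates([("g1", "4.2.2.3"), ("g2", "3.2.1.81"), ("g3", "2.7.1.1")])
# "g3" (a kinase EC) is dropped because it has no substrate entry.
```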

Mannitol gene identification
An initial keyword search across UniProt, UniRef100, KEGG and Pfam annotations was performed to identify genes associated with mannitol utilization pathways [1]. Next, these genes were inspected using a five-gene sliding window to identify potential mannitol operons containing at least two genes, at least one of which contained the keyword "mannitol" in its annotation. A final filter considered their protein domain annotations to estimate their probable gene function. The keywords used in this search are detailed in Table S6.
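The five-gene window search can be sketched as follows. The sketch assumes, hypothetically, that genes are listed in order along each contig, that a boolean flag marks hits from the initial keyword search, and that windows do not span contigs. The final domain-based filter is not modeled.

```python
# Sketch: flag candidate mannitol operon genes with a five-gene sliding window.
# Gene ordering, the is_candidate flag and contig handling are assumptions.
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    contig: str
    annotation: str
    is_candidate: bool  # hit by the initial keyword search (hypothetical flag)


def flag_mannitol_operon_genes(genes, window=5, min_genes=2):
    """Return IDs of candidate genes lying in a qualifying window.

    `genes` must be listed in their order along each contig.
    """
    by_contig = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)
    flagged = set()
    for contig_genes in by_contig.values():
        # Contigs shorter than the window are examined as a single window.
        for start in range(max(1, len(contig_genes) - window + 1)):
            win = contig_genes[start:start + window]
            cands = [g for g in win if g.is_candidate]
            has_mannitol = any("mannitol" in g.annotation.lower() for g in cands)
            if len(cands) >= min_genes and has_mannitol:
                flagged.update(g.gene_id for g in cands)
    return flagged


genes = (
    [Gene("g1", "c1", "mannitol-1-phosphate 5-dehydrogenase", True),
     Gene("g2", "c1", "hypothetical protein", False),
     Gene("g3", "c1", "PTS system IIA component", True)]
    + [Gene(f"g{i}", "c1", "hypothetical protein", False) for i in range(4, 9)]
    + [Gene("g9", "c1", "PTS system IIB component", True),
       Gene("g10", "c1", "transcriptional regulator", True),
       Gene("h1", "c2", "mannitol 2-dehydrogenase", True)]
)
flagged = flag_mannitol_operon_genes(genes)
```

Here g9 and g10 share a window but lack a "mannitol" annotation, and h1 is alone on its contig, so only g1 and g3 are flagged.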

CGC classification and substrate assignment
To assign CGCs to a substrate, only CGCs containing two degradative (GH, PL or CE) CAZy families were considered. These CGCs were inspected manually, taking into account their CAZyme content, the similarity of their genes to EC functions, and their CAZy annotations. A substrate was assigned if the arrangement of CAZy families was consistent with pathways for the degradation of alginate, FCSP, laminarin, carrageenan, starch or agarose (Table S8). Further, CAZymes within classified CGCs were used as templates to identify non-CGC-associated CAZymes with similar features (e.g., CAZy family, EC similarity, Pfam and TIGRFAM annotations). Features occurring frequently within these CGCs (the top third of annotations by frequency) were used to filter genes associated with the alginate, laminarin, FCSP and carrageenan pathways (curated genes are shown in Table S11).
show the relative abundance (%) of each co-abundant group (A1-B4) across individual fish (G117, G121, G124, G125) and gut section (IV, V). Bar colours denote class.
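The automated part of the CGC classification step, before manual inspection, might look like the sketch below. It reads "two degradative families" as "at least two" and uses a small illustrative table of signature CAZy families per substrate. Both are assumptions, and the table is not the rule set in Table S8.

```python
# Sketch: pre-screen CGCs before manual substrate assignment.
# The signature table and the ">= 2" reading are hypothetical.

DEGRADATIVE_PREFIXES = ("GH", "PL", "CE")

# Illustrative signature families per substrate.
SIGNATURES = {
    "alginate": {"PL6", "PL7", "PL17"},
    "laminarin": {"GH17"},
    "FCSP": {"GH29", "GH107"},
    "carrageenan": {"GH82", "GH150"},
    "agarose": {"GH86", "GH117"},
    "starch": {"GH13"},
}


def degradative_families(families):
    """Distinct GH, PL or CE families in a CGC, with subfamily suffixes dropped."""
    return {f.split("_")[0] for f in families if f.startswith(DEGRADATIVE_PREFIXES)}


def candidate_substrates(families, min_degradative=2):
    """Sorted candidate substrates for a CGC, or None if it is not considered.

    An empty list means the CGC qualifies but matches no signature family,
    so it would still need manual inspection.
    """
    degr = degradative_families(families)
    if len(degr) < min_degradative:
        return None  # too few degradative families to classify
    return sorted(s for s, sig in SIGNATURES.items() if degr & sig)


example = candidate_substrates(["GH16", "GH117", "GH86", "GH13_8"])
```

Broad families such as GH16, which include laminarinases, agarases and κ-carrageenases, are deliberately left out of the signatures. This kind of ambiguity is why a manual check remains necessary.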
Research, Irvine, California, United States) by Otago Genomics. Sequencing was performed on three NextSeq 2000 P3-200 flow cells (Illumina, San Diego, CA, United States), yielding at least 110 Gb of 2 × 100 bp paired-end (PE) reads per sample.
interquartile range and median CAZyme density across MAGs in the class, and the whiskers represent the minimum and maximum values within 1.5 times the interquartile range. (C) Boxplots displaying CAZyme density per class, excluding MAGs recovered from section III in 2020.

Figure S2 Gene expression of bacterial classes across samples. (A) Bar plots showing overall gene expression by class per sample in gut sections IV and V. Bars are coloured according to class. The y-axis shows the summed expression of each class per sample (labels on the left), and the x-axis indicates the sample (fish identifier and gut section). TPM = transcripts per million. (B) Stacked bar plots showing the summed expression of CGC genes per class across samples.
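TPM values follow the standard definition. Reads are first divided by gene length, and the resulting rates are rescaled so that each sample sums to one million. The minimal sketch below assumes gene lengths in bp and hypothetical gene-to-class labels. It also includes the per-class summation used in the bar plots.

```python
# Sketch: transcripts per million (TPM) and per-class summed expression.
# Gene lengths in bp and the class labels are assumptions for illustration.


def tpm(counts, lengths):
    """counts, lengths: dict gene -> mapped reads, gene length (bp)."""
    # Reads per base corrects for gene length.
    rate = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rate.values())
    # Rescale so the values in one sample sum to one million.
    return {g: r / total * 1e6 for g, r in rate.items()}


def sum_by_class(tpm_values, gene_class):
    """Sum TPM values of genes belonging to the same taxonomic class."""
    totals = {}
    for g, v in tpm_values.items():
        cls = gene_class[g]
        totals[cls] = totals.get(cls, 0.0) + v
    return totals


values = tpm({"a": 100, "b": 100, "c": 50}, {"a": 1000, "b": 2000, "c": 500})
classes = sum_by_class(values, {"a": "Bacteroidia", "b": "Clostridia", "c": "Bacteroidia"})
# Bacteroidia: about 800,000 TPM; Clostridia: about 200,000 TPM
```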

Figure S3 Longitudinal distributions of co-abundant taxa. (A) Line plot displaying the summed relative abundance of each co-abundant group in sections IV and V (x-axis and outer labels) and its log10 fold change from section IV to V (y-axis). Shading denotes relative abundance in sections IV (pink) and V (dark green). (B) Stacked bar plots
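The log10 fold change plotted in panel (A) can be sketched as the log ratio of each group's summed relative abundance in section V to that in section IV. Positive values therefore indicate enrichment in section V. The pseudocount that guards against zero abundances is a hypothetical choice and is not taken from the study.

```python
# Sketch: summed group abundance per section and log10 fold change (IV -> V).
# The pseudocount and input layout are assumptions for illustration.
import math


def group_log10_fold_change(rel_abund, groups, pseudocount=1e-6):
    """Return group -> (summed IV, summed V, log10(V / IV)).

    rel_abund: dict mag -> {"IV": percent, "V": percent}
    groups: dict mag -> co-abundant group ID (e.g. "A1")
    """
    summed = {}
    for mag, vals in rel_abund.items():
        s = summed.setdefault(groups[mag], {"IV": 0.0, "V": 0.0})
        s["IV"] += vals["IV"]
        s["V"] += vals["V"]
    return {
        g: (s["IV"], s["V"], math.log10((s["V"] + pseudocount) / (s["IV"] + pseudocount)))
        for g, s in summed.items()
    }


result = group_log10_fold_change(
    {"m1": {"IV": 2.0, "V": 10.0}, "m2": {"IV": 3.0, "V": 10.0}, "m3": {"IV": 10.0, "V": 1.0}},
    {"m1": "A1", "m2": "A1", "m3": "B1"},
)
# A1 rises from 5% to 20% (about +0.60); B1 falls from 10% to 1% (about -1.0)
```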